Abstract


Oblivious RAM (ORAM) is a machinery that prevents any RAM from leaking information about
its secret input through its access pattern. It is known that every ORAM must incur a
logarithmic overhead compared to the non-oblivious RAM. In fact, even the seemingly weaker notion
of differential obliviousness, which intuitively “protects” a single access by guaranteeing that the
observed access patterns for every two “neighboring” logical access sequences satisfy (ε, δ)-differential
privacy, is subject to a logarithmic lower bound.


In this work, we show that any Turing machine computation can be generically compiled into a
differentially oblivious one with only doubly logarithmic overhead. More precisely, given a Turing
machine that makes N transitions, the compiled Turing machine makes O(N · log log N) transitions
in total and the physical head movements sequence satisfies (ε, δ)-differential privacy (for a constant
ε and a negligible δ). We additionally show that Ω(log log N) overhead is necessary in a natural
range of parameters (and in the balls and bins model).


As a corollary, we show that there exist natural data structures such as stacks and queues
(supporting online operations) on N elements for which there is a differentially oblivious implementation
on a Turing machine incurring amortized O(log log N) overhead per operation, while it is known
that any oblivious implementation must consume Ω(log N) operations unconditionally even on a
RAM. Therefore, we obtain the first unconditional separation between obliviousness and differential
obliviousness in the most natural setting of parameters where ε is a constant and δ is negligible.
Before this work, such a separation was only known in the balls and bins model. Note that the lower
bound applies in the RAM model while our upper bound is in the Turing machine model, making
our separation stronger.
Differentially Oblivious Turing Machines


1 Introduction


An oblivious RAM (ORAM), introduced in the seminal work of Goldreich and Ostrovsky [17, 
30, 18], is a tool for “encrypting” the access pattern of any RAM so that it looks “unrelated” 
to the underlying data. It is known that any ORAM scheme must incur at least Ω(log N)
overhead, where N is the size of the memory. This was first shown by Goldreich and
Ostrovsky [18] in the balls and bins model¹ and assuming that no cryptographic assumptions
are used. More recently, Larsen and Nielsen [25] proved the same lower bound without the
two aforementioned restrictions but requiring the ORAM to support operations arriving
in an online manner. In fact, a follow-up work by Jacob et al. [23] showed that Ω(log N)
overhead is necessary even for obliviously implementing very specific data structures (as 
defined in [43]) such as stacks, queues, and more. 


Apparently, logarithmic overhead is necessary even for implementing a RAM with a
(seemingly) much weaker security guarantee than full obliviousness. Persiano and Yeo [34]
considered the notion of differentially oblivious RAM,² a relaxation of ORAM that only
protects individual operations by guaranteeing (ε, δ)-differential privacy for the observed
access pattern of the RAM (see Section 2.3 for a formal definition). Differential obliviousness
was also studied in the context of specific functionalities by Chan et al. [7] and Beimel et
al. [4]. These works show that there are tasks for which obtaining differential obliviousness might be
easier than full obliviousness. For instance, Chan et al. [7] show that there is a differentially
oblivious algorithm for sorting N records according to a 1-bit key while maintaining the
relative ordering of records with identical keys in time O(N · log log N),³ while [26] showed
a conditional Ω(N · log N) lower bound for full-fledged obliviousness in the balls and bins
model. This leaves the following natural question open.


Is there an unconditional separation between obliviousness and differential obliviousness? 


Let us remark that in the above question we are interested in the most standard models 
and range of parameters. For RAMs, we consider the standard word-RAM where each 
memory word is large enough to store its own logical address, where word-level addition and 
Boolean operations can be done in unit cost, and where the CPU has a constant number of
private registers. For (ε, δ)-differential obliviousness, we want schemes that are secure for ε
being a fixed constant and δ being a negligible function. For Turing machines, we allow an
arbitrary number (which is fixed as part of the machine’s description) of one-dimensional
bi-directional infinite work tapes, where in every step each head can move left, move right, or stay
in place.


¹ This model treats each memory word as “indivisible” and restricts the ORAM to only move
blocks around and not apply any non-trivial encoding of the underlying secret data; see Boyle and
Naor [6].

² Persiano and Yeo [34] called this notion differentially private RAM, but we prefer to use differentially
oblivious RAM to (1) relate to the notion of oblivious RAM and stress that the goal is to preserve the 
physical access pattern’s privacy and (2) be aligned with previous work on the topic (Chan et al. [7]). 
Usually, differential privacy concerns the observed output of some algorithm. In our context, the output 
of an algorithm consists of the transcript of the computation: the physical memory accesses performed 
during the computation. 

³ Maintaining the relative ordering of records is called stability. Without stability, sorting records according
to 1-bit keys is known to be doable (deterministically and obliviously) in linear time [2, 3]. 
1.1 Our Results


We present a large class of functionalities that can be made differentially oblivious with 
only O(log log N) overhead. The class includes many natural and useful algorithms and 
data structures such as stacks and queues and therefore implies an unconditional separation 
between obliviousness and differential obliviousness. 


> Theorem 1 (A separation; informal). There exists a data structure (e.g., a stack or a queue)
supporting N operations for which:

1. Any oblivious implementation (even on a RAM) requires Ω(N · log N) operations;

2. There is an (ε, δ)-differentially oblivious two-tape Turing machine (defined below) that
requires


O(N · (log(1/ε) + log log N + log log(1/δ)))


operations.
In particular, letting ε > 0 be a constant and δ = 2^{−log² N} (which is negligible), the
number of operations incurred by the differentially oblivious machine is O(N · log log N).
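
The parameter setting in Theorem 1 can be checked by direct substitution. The following is a worked sketch, assuming all logarithms are base 2 and treating log(1/ε) as a constant:

```latex
\begin{align*}
\delta = 2^{-\log^2 N}
  &\;\Longrightarrow\; \log(1/\delta) = \log^2 N
   \;\Longrightarrow\; \log\log(1/\delta) = 2\log\log N, \\
\log(1/\epsilon) &= O(1) \quad \text{for constant } \epsilon > 0, \\
O\bigl(N \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta))\bigr)
  &= O\bigl(N \cdot (O(1) + 3\log\log N)\bigr)
   = O(N \cdot \log\log N).
\end{align*}
```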


The above theorem follows from a much more general result about differentially oblivious 
Turing machines. Oblivious Turing machines were first introduced in 1979 by Pippenger 
and Fischer [35]. In this model, “memory accesses” correspond to the head’s movements 
throughout the execution of the algorithm (i.e., Left, Right, or Stay). Pippenger and Fischer 
showed how any multi-tape Turing machine can be obliviously simulated by a two-tape 
Turing machine with a logarithmic slowdown in running time. More precisely, any Turing 
machine that makes N steps can be simulated obliviously while consuming O(N · log N)
steps. The simulation is deterministic and perfectly oblivious: the same sequence of head 
movements is observed for any two inputs. 

Adapting the notion of differential obliviousness to the Turing machine model, we show 
that any Turing machine that makes N steps can be simulated by a differentially oblivious 
machine while making only O(N · log log N) steps. Here, neighboring sequences of head
movements are ones where only one transition is different. For instance, the logical sequences 
of transitions {Left, Right, Left, Right} and {Left, Right, Right, Right} are neighboring. 


> Theorem 2 (A differentially oblivious Turing machine; see Theorem 11). For any ε, δ > 0, any
k-tape Turing machine that makes at most N steps can be simulated by an (ε, δ)-differentially
oblivious machine with max{2, k} tapes making O(N · (log(1/ε) + log log N + log log(1/δ)))
steps.


As above, letting ε > 0 be a constant and δ = δ(N) be a particular negligible function, the
number of steps incurred by the differentially oblivious machine is O(N · log log N). We
note that the constant hidden in the O notation depends only on the description size of the
given Turing machine (i.e., its alphabet size, number of tapes, etc.). Let us remark that the
number of tapes we use is essentially optimal since even without any security requirements
simulating a k-tape Turing machine for k > 3 on a (k − 1)-tape one is not known to be
possible with better than logarithmic overhead in steps (Hennie and Stearns [22]). Also,
simulating a 2-tape machine on a single tape machine has polynomial overhead (Hartmanis 
and Stearns [20] for the upper bound and Hennie [21] for a lower bound). 

Theorem 1 follows from Theorem 2 as follows. Consider (for instance) the stack data 
structure on N elements, supporting (“online”) Push and Pop operations. By Theorem 2
and using the fact that a stack can be implemented in linear time on a Turing machine,
there is an (ε, δ)-differentially oblivious Turing machine implementing it whose overhead is
O(log log N) for a constant ε and negligible δ, as above. As mentioned, the logarithmic lower
bound follows from Jacob et al. [23]. 

Lastly, we observe that a lower bound of Chan et al. [7] can be tweaked to show that our 
construction is essentially optimal by showing that Ω(log log N) overhead is necessary for
differential obliviousness in a natural range of parameters and in the balls and bins model. 


> Theorem 3 (A lower bound; see Theorem 13). There exists an algorithmic task for which
there is a Turing machine that on input of size N completes it in O(N) steps. On the
other hand, for any 0 < s < √N, ε > 0, 0 < β < 1, and 0 < δ < β · (ε/s) · e^{−2εs}, any
(ε, δ)-differentially oblivious implementation in the balls and bins model (even on a RAM)
for this task must consume Ω(N · log s) steps with probability 1 − δ.


In particular, for a constant ε > 0 and s > log² N, we can set δ = 2^{−Ω(log² N)} (which is
negligible) and get that Ω(log s) = Ω(log log N) overhead is necessary. Note that if we want
δ = 2^{−N^{Ω(1)}} then the lower bound above says that the best we can hope for is Ω(log N)
overhead. As mentioned, with logarithmic overhead we can actually get perfect obliviousness
for any Turing machine [35].


1.2 Related Work 


Goldreich and Ostrovsky [17, 18] showed that any RAM that uses a memory of size N
and makes T accesses can be made oblivious using only O(T · poly log N) accesses. The
resulting RAM is probabilistic and obliviousness holds against polynomial-time distinguishers
assuming the existence of one-way functions. The concept of oblivious RAM has inspired an
immense amount of research. One line of work focuses on applications of such compilers
to cryptography and security, including applications in cloud computing, secure processor
design, multi-party computation, and more (for example, [31, 38, 39, 5, 14, 36, 29, 15,
42, 16, 27, 45, 28, 44]). Another line of work focuses on improving the overhead of the
compiler [37, 24, 19, 8, 40, 41]. Only recently, a couple of works [32, 2] have resolved the
problem by presenting a compiler whose overhead is O(log N) (while still relying on one-way
functions).

Patel et al. [33] considered the natural question of what kind of security one can hope for
while limiting the overhead of a RAM simulation to a constant. They show a construction
of an (ε, δ)-differentially oblivious RAM with O(1) overhead for ε = O(log N), also
assuming that the client can store ω(log N) records. They also proved a lower bound which
quantitatively improves upon the one of [34] in the dependence on ε but is qualitatively
worse since it is in the balls and bins model. Throughout this work, we focus on the setting
where ε is a fixed constant and the client’s storage is a constant number of blocks.

The work of Pippenger and Fischer [35] came in a long line of works trying to pin down
the exact relation between various computational models. One notable work is that
of Hennie and Stearns [22] who showed that any multi-tape Turing machine can be simulated 
by a two-tape machine with logarithmic overhead. Pippenger and Fischer’s result can be 
viewed as a similar compiler except that their resulting machine is also oblivious. Note that 
the result of [22] is the reason why one should not hope to improve the number of tapes in 
the resulting machine in Theorem 2 to two (as this task, even without privacy, is not known 
to be possible with less than logarithmic overhead). Simulating a 2-tape Turing machine on 
a single tape machine requires polynomial overhead due to Hartmanis and Stearns [20] and 
Hennie [21]. 
Some of our ideas in the differentially oblivious Turing machine construction are reminiscent
of the aforementioned differentially oblivious algorithm (in the RAM model) for stable
tight compaction due to Chan et al. [7]. Technically, their algorithm uses similar tools from
the differential privacy literature (namely, differentially private prefix sums due to Chan et
al. and Dwork et al. [9, 10, 12]) but the way they use them differs in nature from our approach.
Partly, this is because our target machine is a Turing machine rather than a RAM, and 
therefore, standard building blocks such as oblivious sorting (which they use) are inapplicable. 
Second, even if we allow compiling a Turing machine to a differentially oblivious RAM (rather 
than insisting on Turing machine as the target machine), we still cannot directly use their 
techniques for constructing stable tight compaction because their techniques which rely on 
oblivious sorting are offline in nature; and thus not compatible with the online nature of our 
differentially oblivious simulation. 


1.3 Technical Roadmap 


In this overview we will focus on simulating a Turing machine with a single tape for N steps. 
There are many complications and technical difficulties that arise in the multi-tape case, but 
we refer to the technical sections for details. 

Given a Turing machine, our goal is to hide the location of the head during the execution
of the machine, in a differentially private manner. To this end, we first develop an efficient
differentially private algorithm for estimating the location of the head at pre-defined points
in time. Naively, we could add fresh Laplace noise every time we need an estimate, but
this would incur at least a √N loss in the privacy budget (by standard composition theorems).

To get around this, inspired by the work of Chan et al. [7], we use a differentially private 
prefix sum algorithm [9, 10, 12] to account for the location of the head. Recall that in the 
prefix sum algorithm, a stream of numbers arrives in an online manner and the algorithm
outputs the sum of all numbers seen so far, after seeing every number. We set up the numbers
to correspond to head movements (“Left” for -1 and “Right” for 1) and show that this 
approach incurs only poly log N loss in privacy budget, which is good enough in terms of 
privacy. One challenge that we run into is that we need to implement the differentially private 
prefix sum algorithm on a Turing machine. It turns out that every time we need to get an 
estimate of the head’s location (i.e., get a prefix sum), we need to pay some non-trivial factor 
in running time and so we need to minimize the number of such estimations. Therefore, we 
design our algorithm to work with only one estimate of the head’s location every poly log N 
steps and amortize the cost of this estimation while processing the next poly log N steps of 
the Turing machine. 
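
To make the encoding concrete, the following minimal sketch computes the exact (non-private) quantity that the private prefix-sum algorithm approximates: the head position after every p moves. The names `STEP`, `head_positions`, and `checkpoint_positions` are illustrative and do not come from the construction itself; the encoding of Stay as 0 is taken from the technical sections.

```python
from itertools import accumulate

# Encoding of head movements as numbers: Left = -1, Right = +1, Stay = 0.
STEP = {"L": -1, "R": +1, "S": 0}

def head_positions(moves):
    """Exact head position after each move (the prefix sums of the stream)."""
    return list(accumulate(STEP[m] for m in moves))

def checkpoint_positions(moves, p):
    """Exact head position after every p-th move."""
    prefix = head_positions(moves)
    return [prefix[i - 1] for i in range(p, len(moves) + 1, p)]

moves = ["R", "R", "L", "S", "R", "R", "L", "L"]
print(head_positions(moves))           # [1, 2, 1, 1, 2, 3, 2, 1]
print(checkpoint_positions(moves, 4))  # [1, 1]
```

Changing a single move changes every later prefix sum by at most 2, which is why the stream of movements is a natural input for a differentially private counter.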

Once we have a good-enough estimate of the head’s location every poly log N steps, all 
that is left is to copy the nearby positions to a smaller oblivious Turing machine which we 
use to simulate the next polylog N steps. We set up the parameters in such a way that 
we copy enough positions around the estimated head’s location to actually include the real 
head position along with the relevant tape around it to perform the next polylog N steps 
so the above is well defined. The oblivious Turing machine that we need must provide an
“initialization” procedure that allows us to start an oblivious Turing machine from an existing
memory, and a “destruction” procedure that allows us to extract the memory to its original
structure at the end of the execution. Pippenger and Fischer’s [35] construction does not
provide such procedures so we describe a variant that does. As an independent contribution,
our new oblivious Turing machine is described in a language that more closely resembles the
hierarchical oblivious RAM construction of Goldreich and Ostrovsky [17, 18] so it might be 
easier to understand for those who are familiar with the latter. 
Lastly, let us remark that the above description was very high level and glossed over
many technical details. For example, one technicality arises because we have to use the
tapes sparingly in the compiled Turing machine to get a theorem that is tight in the number
of tapes. To achieve this, we develop algorithmic tricks that allow us to reuse the same
tape for multiple purposes without incurring any overhead in asymptotic running
time. Another technical challenge arises because unlike earlier explorations of differential
obliviousness [7, 4], our target machine is a Turing machine rather than a RAM. This imposes
additional constraints on our algorithm design, since we cannot use common building blocks
such as oblivious sorting. Moreover, the online nature of our differentially oblivious simulation
also renders some previous building blocks inadequate (which we discuss more in the Related
Work section). We refer to the technical sections for details.


2 Preliminaries


For an integer n ∈ ℕ we denote by [n] the set {1, …, n}. A function negl: ℕ → ℝ⁺ is
negligible if for every constant c > 0 there exists an integer N_c such that negl(λ) < λ^{−c} for
all λ > N_c.


2.1 Turing Machines 


We follow the presentation of Arora and Barak [1] for the definition of a k-tape Turing 
machine. A tape is an infinite bi-directional line of cells, each of which can hold a symbol from 
a finite set called the alphabet. Each tape is associated to a tape head that can potentially 
read or write symbols to the tape one cell at a time. The machine’s computation is divided 
into discrete time steps, and the head can either stay in place or move left or right one cell 
in each step. More formally, a Turing machine M is described by a tuple (Γ, Q, δ), where
Γ is a set of symbols that M’s tapes can contain, Q is the set of M’s possible states, and
δ: Q × Γ^k → Q × Γ^k × {L, S, R}^k is M’s transition function.

If the machine is in state q ∈ Q and (σ_1, σ_2, …, σ_k) are the symbols currently being read
in the k tapes, and δ(q, (σ_1, …, σ_k)) = (q′, (σ′_1, …, σ′_k), z), where z ∈ {L, S, R}^k, then at the
next step the σ symbols in the k tapes will be replaced by the σ′ symbols, the machine will
be in state q′, and the k heads will move Left, move Right, or Stay in place, as given by z. There are
additionally a read-only tape for the input and a write-only tape for the output, and perhaps 
a randomness tape if needed, but we ignore those when counting the number of tapes and 
only account for the work tapes. 
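
The definition above can be sketched as a small simulator that also records the sequence of head movements, which is the object the security definitions in this work are about. This is an illustrative sketch only: the function `run`, the blank symbol `BLANK`, the designated halting state, and the example machine are all assumptions made for the example, and the input, output, and randomness tapes are omitted.

```python
from collections import defaultdict

BLANK = "_"                          # hypothetical blank symbol
SHIFT = {"L": -1, "S": 0, "R": +1}   # effect of each movement on a head

def run(delta, k, start, halt, max_steps):
    """Run a k-tape machine; return (tapes, list of k-tuples of head moves)."""
    tapes = [defaultdict(lambda: BLANK) for _ in range(k)]
    heads = [0] * k
    state, movements = start, []
    for _ in range(max_steps):
        if state == halt:
            break
        # Read the symbol under each head, then apply the transition function.
        read = tuple(tapes[j][heads[j]] for j in range(k))
        state, write, z = delta[(state, read)]
        for j in range(k):
            tapes[j][heads[j]] = write[j]
            heads[j] += SHIFT[z[j]]
        movements.append(z)
    return tapes, movements

# A one-tape machine that writes three 1s and halts.
delta = {
    ("q0", (BLANK,)): ("q1", ("1",), ("R",)),
    ("q1", (BLANK,)): ("q2", ("1",), ("R",)),
    ("q2", (BLANK,)): ("halt", ("1",), ("S",)),
}
tapes, movements = run(delta, k=1, start="q0", halt="halt", max_steps=10)
print(movements)  # [('R',), ('R',), ('S',)]
```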

The definition above is quite robust to the choices one makes regarding the alphabet size, 
the number of tapes, etc, since they are all equivalent in terms of complexity up to small 
factors. We recall the known facts which can be found, for example, in Arora and Barak [1]. 


> Fact 4. It holds that: 

1. Every function f that is computable in time T using alphabet Γ, can be computed in time
O(log |Γ| · T) using an alphabet of size O(1).

2. Every function f that is computable in time T using k tapes, can be computed in time
O(k · T²) on a single tape machine and in time O(k · T · log T) on a two-tape machine.

3. Every function f that is computable in time T using k bi-directional tapes, can be computed 
in time O(T) using k standard (uni-directional) tapes. 


4. Every function f that is computable in time T using k tapes, can be computed in time
k · T using k tapes such that in each step only one of the tapes moves.
We mention the dependence on k in the above terms for explicitness even though it is a
constant. 

In this work, we care about logarithmic factors so, by default, our Turing machine model 
is that of a two-tape machine. By the above, it does not matter if we consider uni-directional 
or bi-directional tapes. Constant factors in the alphabet size do not matter as well. All of 
the above only affect the constants which are hidden inside the O notation. 


2.2 Differential Privacy 


Differential privacy, introduced by Dwork et al. [11], is a property of algorithms that, very 
roughly, guarantees “security” for a single record in the input. Namely, if the algorithm 
acts on the information of a set of individuals, from the output it is hard to decide whether 
a particular individual’s information was used in the computation. This is formalized as 
follows. Let A be a probabilistic algorithm that takes as input a dataset. Let Im(A) be the 
set of all possible outputs of A. The algorithm A is said to be (ε, δ)-differentially private if
for all datasets D_0 and D_1 that differ only on one entry, and all possible subsets S ⊆ Im(A),
it holds that


Pr[A(D_0) ∈ S] ≤ e^ε · Pr[A(D_1) ∈ S] + δ,


where e is the base of the natural logarithm. 
We refer to Dwork and Roth [13] for more information on differential privacy. 
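
As a small numeric illustration of the definition (a sketch, not part of the formal development), consider releasing a count with additive Laplace noise of scale 1/ε, where neighboring datasets change the count by at most 1. The code below checks on a grid that the ratio of the two output densities never exceeds e^ε, which implies the inequality above with δ = 0 for every set S. The helper names and the chosen counts 3 and 4 are assumptions made for the example.

```python
import math

def laplace_pdf(y, mean, scale):
    """Density of the Laplace distribution with the given mean and scale."""
    return math.exp(-abs(y - mean) / scale) / (2.0 * scale)

def max_density_ratio(f0, f1, eps, grid):
    """Largest ratio of output densities over the grid, for true counts f0, f1."""
    scale = 1.0 / eps  # noise scale for a query whose value changes by at most 1
    return max(laplace_pdf(y, f0, scale) / laplace_pdf(y, f1, scale) for y in grid)

eps = 0.5
grid = [x / 10.0 for x in range(-200, 201)]
ratio = max_density_ratio(f0=3.0, f1=4.0, eps=eps, grid=grid)
print(ratio <= math.exp(eps) + 1e-9)
```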


2.3 (Differentially) Oblivious Turing Machines 


Obliviousness. Obliviousness is nowadays usually defined for RAMs and it guarantees
that the access pattern of the RAM is “independent” of the underlying input. More
specifically, given a RAM M and an input I, we consider a random variable Accesses(M, I)
that corresponds to the ordered sequence of memory locations M accesses during an execution
on input I. We then require that the distribution of Accesses(M, I_1) is indistinguishable from
that of Accesses(M, I_2) for any I_1 and I_2 of the same length. The precise notion of indistinguishability
can be either computational, statistical, or perfect, depending on the context.

A Turing machine can be thought of as a restricted version of a RAM where random accesses
are not allowed: any two consecutive accesses must be to adjacent locations
(i.e., the head can move at most one cell at a time). Adapting the notion of obliviousness to
the Turing machine model requires that the heads’ movements during the execution of
the algorithm do not leak information about the inputs. Again one can define various notions
of obliviousness, including computational, statistical, or perfect. We consider the strong notion
of deterministic perfect obliviousness.


> Definition 5 (Oblivious Turing machine). A Turing machine M is said to be oblivious if
for every input x ∈ {0,1}* and i ∈ [N], the location of each of M’s heads at the ith step of
execution on input x is only a function of |x| and i.


Differential obliviousness. Differential obliviousness was introduced by Chan et al. [7] as 
a relaxation of obliviousness for RAMs. At a high level, this security notion only protects 
individual operations, rather than the whole sequence of operations. This is formalized by 
requiring (ε, δ)-differential privacy for the observed access pattern of the RAM. In this case,
two sequences of accesses I_0 and I_1 are neighboring if they are of the same length and differ
in exactly one location accessed.
When we adapt the notion to the Turing machine model, one needs to distinguish between
the input of the computation and the induced sequence of head movements that this input 
causes. While in some cases the two are analogous (which is the case in some of our results), 
in other cases there might be a gap. Namely, there are cases where the inputs are very close 
(in say, Hamming distance) and yet the resulting head movements are far from each other. 
Nevertheless, defining neighboring inputs w.r.t the observed sequence of head movements, 
as we do next, still implies a privacy guarantee for the inputs using standard group privacy 
theorems [11]. 


> Definition 6 (Neighboring inputs). Let M be a k-tape Turing machine. For J ∈ {0,1}*, let
Movements(M, J) ∈ ({L, R, S}^k)* be the sequence of head movements that M makes on input
J. Two inputs J_0, J_1 ∈ {0,1}* are called neighboring if

1. |Movements(M, J_0)| = |Movements(M, J_1)| and

2. Movements(M, J_0) and Movements(M, J_1) differ in exactly one location.

That is, letting (a^b_{1,1}, …, a^b_{1,k}), …, (a^b_{N,1}, …, a^b_{N,k}) = Movements(M, J_b) for b ∈ {0,1}, there
is exactly one pair (i*, j*) ∈ [N] × [k] such that a^0_{i*,j*} ≠ a^1_{i*,j*}, while for all other (i, j) ∈
[N] × [k] \ {(i*, j*)}, it holds that a^0_{i,j} = a^1_{i,j}.
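
The neighboring relation can be sketched as a direct check on two movement sequences. The function name `are_neighboring` and the representation (a list with one k-tuple over {"L", "R", "S"} per step) are assumptions made for illustration.

```python
def are_neighboring(mov0, mov1):
    """True iff the sequences have equal length and differ in exactly one
    (step, tape) coordinate. Each sequence is a list of k-tuples of moves."""
    if len(mov0) != len(mov1):
        return False
    diffs = sum(
        1
        for step0, step1 in zip(mov0, mov1)
        for a, b in zip(step0, step1)
        if a != b
    )
    return diffs == 1

# The single-tape example from the introduction.
seq0 = [("L",), ("R",), ("L",), ("R",)]
seq1 = [("L",), ("R",), ("R",), ("R",)]
print(are_neighboring(seq0, seq1))  # True
```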


Given this notion of neighboring inputs, we give the definition of differential obliviousness. 


> Definition 7 ((ε, δ)-differentially oblivious Turing machine). A Turing machine M is
(ε, δ)-differentially oblivious iff for any neighboring inputs J_0 and J_1 and any set S ⊆
({L, R, S}^k)* of possible sequences of head movements, it holds that


Pr[Movements(M, J_0) ∈ S] ≤ e^ε · Pr[Movements(M, J_1) ∈ S] + δ.


3 Estimating Heads’ Locations


In this section, we present an algorithm running on a Turing machine that outputs estimates
of the location of the heads in a Turing machine computation. More precisely, the input to
the algorithm is a sequence of movements of the heads of the machine (i.e., Left, Right or
Stay for each tape), and it outputs an estimate of the location of each head at a-priori fixed
intervals of time in an online fashion. The algorithm guarantees that (1) each estimate is not
too far from the true position of the head, (2) the estimates are differentially private, (3) the
algorithm’s head movements themselves are oblivious (i.e., data-independent), and (4) the
algorithm is very efficient. The length of the intervals at which we output an estimate of the
location of the heads is denoted p.


> Theorem 8. There exists an algorithm EstimateHead_{ε,δ} such that for any ε, δ > 0, the
following holds. Fix any stream a = a_1, a_2, …, a_N ∈ {L, S, R}^k that corresponds to the
movements of the heads of a k-tape Turing machine. Let α_i = Σ_{j=1}^{i·p} a_j for i ∈ [N/p] (i.e.,
the true position of the heads every p steps, interpreting L, S, R as −1, 0, 1 on each tape).
Let {α̃_i}_{i∈[N/p]} denote a possible output of the
algorithm EstimateHead_{ε,δ} when fed a as an input in an online fashion. It holds that:
1. Utility: With probability 1 over the randomness of EstimateHead_{ε,δ}, it holds that


max_{i∈[N/p]} |α̃_i − α_i| ≤ O((1/ε) · log^{1.5} N · log(1/δ)).

2. Differential privacy: EstimateHead_{ε,δ} is (ε, δ)-differentially private. Here, neighboring
sequences are defined in the natural way by allowing only one of the k indices in one of
the N a_i’s to differ between the two sequences.

3. Obliviousness: The algorithm itself is perfectly oblivious.

4. Efficiency: The algorithm runs in time O(k · N + k · (N/p) · log N).
Proof. The algorithm builds on the differentially private prefix-sum algorithm of Chan et
al. [9, 10] and Dwork et al. [12]. Their algorithms address the problem of continuously
estimating the prefix sums of elements in a given stream of numbers while maintaining
differential privacy. We follow the presentation of these algorithms from Dwork and Roth [13,
§12.3]. The algorithm is given a stream of numbers b = b_1, b_2, …, b_N ∈ {−1, 0, 1} that
it sees in an online fashion. After seeing b_1, …, b_i, the algorithm outputs an
approximation of Σ_{j=1}^{i} b_j. This task is almost what we need to prove our theorem for k = 1
(i.e., the machine has one tape). Indeed, a movement left (resp. right) can be interpreted as
−1 (resp. 1) and staying in place corresponds to seeing 0. The location of the head is exactly
the sum of those numbers. Additionally, it is not hard to observe that their algorithm is in
fact oblivious (see below). Nevertheless, the running time of their algorithm on a Turing
machine is O(N · log N) (see below). So, we need to (1) extend the algorithm to handle any
k > 1 tapes and (2) show how to implement it in the specified running time on a Turing
machine. Since both goals are somewhat non-trivial to achieve, let us first briefly recall their
algorithm and state its guarantees, and then describe our modifications.

Assume that N is a power of 2 (for simplicity and without loss of generality). We associate
the N numbers to leaves of a full binary tree and then label each node in the tree with an
“interval”. The ith leaf (for i ∈ [N]) is labeled with [i, i]. An internal node is labeled with the
interval that is the union of the intervals associated with its children. Now, with each node,
labeled [s, t] in this tree, we associate a noisy count that approximates the sum of the values
seen in positions s, s+1, …, t by adding noise from the appropriate distribution. In [9, 10, 12]
the added noise was sampled from Lap((1 + log₂ N)/ε), where Lap(b) denotes the (continuous)
Laplace distribution with mean 0 and variance 2b². To output α̃_i (i.e., the approximation
of Σ_{j=1}^{i} a_j), we write i in binary to find at most log₂ N intervals whose union is [1, i], and
compute the sum of the corresponding noisy counts. These intervals are associated to the
nodes which are called the frontier. This algorithm satisfies (ε, 0)-differential privacy and
satisfies the following utility property: With probability 1 − δ over the randomness of the
algorithm,

max_{i∈[N/p]} |α_i − α̃_i| ≤ O((1/ε) · log^{1.5} N · log(1/δ)).

It is easy to turn the utility property to be satisfied with probability 1 by outputting the
exact prefix sum in the clear whenever the error in the output is too large. This causes the
algorithm to be (ε, δ/2)-differentially private, as needed.
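
The tree-based counter recalled above can be sketched as follows. This is a plain Python illustration on an ordinary computer, not the Turing machine implementation: it omits the in-the-clear fallback for large errors, keeps all noisy counts in a dictionary rather than maintaining a frontier, and the names `laplace`, `dyadic_cover`, and `noisy_prefix_sums` are assumptions made for the example.

```python
import math
import random

def laplace(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dyadic_cover(i):
    """Disjoint tree intervals [s, t] (1-indexed) whose union is [1, i]."""
    cover, start = [], 1
    for bit in reversed(range(i.bit_length())):
        if i & (1 << bit):
            cover.append((start, start + (1 << bit) - 1))
            start += 1 << bit
    return cover

def noisy_prefix_sums(stream, eps, rng):
    """Noisy estimate of every prefix sum of `stream` (length a power of 2)."""
    n = len(stream)
    scale = (1 + math.log2(n)) / eps
    exact = [0]
    for b in stream:
        exact.append(exact[-1] + b)
    noisy = {}  # one noisy count per tree node, sampled once and then reused
    out = []
    for i in range(1, n + 1):
        total = 0.0
        for (s, t) in dyadic_cover(i):
            if (s, t) not in noisy:
                noisy[(s, t)] = exact[t] - exact[s - 1] + laplace(scale, rng)
            total += noisy[(s, t)]
        out.append(total)
    return out

rng = random.Random(0)
stream = [1, 1, -1, 0, 1, 1, -1, -1]
estimates = noisy_prefix_sums(stream, eps=1.0, rng=rng)
print(dyadic_cover(6))  # [(1, 4), (5, 6)]
```

Each estimate sums at most log₂ N + 1 noisy counts, and every input contributes to at most log₂ N + 1 tree nodes, which is where the noise scale (1 + log₂ N)/ε comes from.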


Handling multiple tapes. We extend the algorithm to handle k tapes by maintaining k 
prefix sums computed in parallel. This clearly does not hurt utility or obliviousness and only 
incurs a k factor in running time. However, naively, it incurs a k factor in differential privacy. 
Nevertheless, we observe that considering any two neighboring sequences of inputs, k — 1 of 
the tapes will have the exact same access pattern while only one will differ in one position, 
and so this extension, in fact, does not incur a k factor in differential privacy. 


Running time. The main challenge is to maintain an updated version of the noisy counts
associated with the nodes in the frontier. Recall that the frontier is of size $\log_2 N + 1$. Naively,
with the above algorithm, computing the frontier at time $i + 1$ from the frontier at time $i$
may cost up to $O(\log N)$ work, which is too expensive for us. However, recall that we do not
need a prefix sum after every $a_i$; rather, we want to output one after every $p$ inputs. So,
instead of having a full binary tree where the leaves correspond to individual inputs, we consider a
full binary tree where each leaf corresponds to a sequence of $p$ inputs and is labeled by
their sum. The depth of this tree is $\log_2(N/p)$, and the point is that we need to compute the
“next” frontier (which costs about $\log_2 N$) only once every $p$ operations, so the total cost is
$O(k \cdot (N/p) \cdot \log N)$ plus the time it takes to aggregate the sums themselves, which is $O(k \cdot N)$, as
needed. ◀
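
The blocked tree described above can be sketched in a few lines of Python. This is only an illustration in the spirit of the construction, not the paper's exact algorithm: the function names, the choice of noise scale (number of tree levels divided by $\epsilon$), and the use of floating-point Laplace samples are all assumptions made for the sketch.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def blocked_noisy_prefix_sums(inputs, p, eps, rng=None, noise=laplace):
    """Return one noisy prefix sum after every block of p inputs.

    Assumed (not taken from the paper): noise scale = levels / eps, and a
    frontier stored as one exact/noisy partial sum per tree level.
    """
    rng = rng or random.Random()
    blocks = [sum(inputs[i:i + p]) for i in range(0, len(inputs), p)]
    levels = max(1, len(blocks).bit_length())   # number of tree levels
    scale = levels / eps
    exact = [0] * levels      # exact partial sum held at each frontier level
    noisy = [0.0] * levels    # its noisy counterpart
    outputs = []
    for t, block_sum in enumerate(blocks, start=1):
        i = (t & -t).bit_length() - 1            # index of the lowest set bit of t
        exact[i] = sum(exact[:i]) + block_sum    # merge the lower levels into level i
        for j in range(i):
            exact[j] = 0
            noisy[j] = 0.0
        noisy[i] = exact[i] + noise(scale, rng)  # fresh noise for the new node
        # The frontier nodes are exactly the set bits of t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs
```

The frontier is touched once per block rather than once per input, which is where the $N/p$ factor in the running time comes from; summing the inputs of each block accounts for the remaining linear term.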


▶ Remark 9 (Sampling from Lap). We emphasize that the above algorithm assumes that
a Turing machine is capable of sampling from $\mathrm{Lap}(\cdot)$ in $O(1)$ time. This is assumed for
simplicity of presentation. However, it is possible to efficiently compute an estimate of this
distribution on a standard Turing machine. The cost of this approximation is small
enough to obtain (asymptotically) the same final result in Theorem 11. Therefore, the
assumption made in this section is without loss of generality in the context of our main
result. See details in Appendix A.


4 Oblivious Turing Machines


A classical result by Pippenger and Fischer [35] shows that any Turing machine computation 
can be made perfectly oblivious (i.e., Definition 5) on a two-tape machine with amortized 
logarithmic overhead. More precisely, any Turing machine that makes at most $N$ steps can
be made perfectly oblivious while making $O(N \cdot \log N)$ steps.

In our application we need an oblivious Turing machine that supports two additional
properties. The first, called “initialization”, is that one can initialize an oblivious Turing 
machine with a given memory (as opposed to starting off with an empty memory). The 
second, called “destruction”, returns the state of the memory in a linear fashion. 

Our construction is similar in spirit to the one of Pippenger and Fischer [35]. However, 
we present it in a language that more closely resembles the hierarchical oblivious RAM 
construction of Goldreich and Ostrovsky [17, 18] so it might be easier to understand for 
those who are familiar with the latter. 


▶ Theorem 10 (Oblivious Turing machine, revisited). Any $k$-tape Turing machine that makes at
most $N$ steps can be executed obliviously on a two-tape Turing machine with $O(N)$ space and
with $O(k \cdot N \cdot \log N)$ steps. Additionally, the machine supports initialization and destruction.


Proof. We will present the main idea in the special case where the given Turing machine
has only a single tape and the resulting machine will have $\ell = \lceil \log N \rceil$ tapes. Later, we will
explain how to handle multi-tape input machines and simulate them obliviously
with just two tapes (at the same cost).

We have $\ell$ tapes and sometimes we will refer to these tapes as “levels” (analogously to
levels in [18]’s hierarchical ORAM construction). For each $i \ge 1$, level $T_i$ is a tape that
contains at most $\ell_i = 2^i - 1$ elements and it is thought of as a cyclic buffer. That is, the
element on the right of the $(2^i - 1)$th element in level $T_i$ is the 1st one. Our construction
will maintain the following invariant. Let $p$ be the pointer to the current head location of
the original Turing machine. At the end of every $2^i$ steps, $T_i$ stores the content of the tape
at positions $[p - 2^{i-1} : p + 2^{i-1}]$.

According to this, level $T_1$ will always store the content of the cell pointed to by the head,
level $T_2$ stores the content of cells $p - 1, p, p + 1$, and so on. Notice that the same cell may
be part of several levels and not all of the values will be consistent with each other. The
freshest copy of a cell is always in the $T_i$ with the smallest $i$ that contains the cell. Reading
the value of the cell pointed to by the head or writing to that cell is done by reading or
writing (respectively) directly to $T_1$.

For every $i \ge 0$, at the end of every $2^i$ steps, the following reorganization steps are
performed:

1. $T_{i+1}$ writes its updated contents to $T_{i+2}$. This is done by making a pass over $T_{i+2}$ and
scanning over $T_{i+1}$ as a cyclic buffer, making a real write whenever needed and making a
dummy write otherwise.

2. $T_{i+1}$ copies the corresponding segment it ought to store from $T_{i+2}$. This is done, as above,
by making a pass over $T_{i+2}$ and scanning over $T_{i+1}$ as a cyclic buffer, making a real write
whenever needed and making a dummy write otherwise.


Correctness follows immediately from the description. Perfect obliviousness follows from the
fact that the head’s movements are deterministic and fixed a priori. For efficiency, consider
any sequence of $N$ steps. A read or a write costs a single operation, since it only
accesses $T_1$. It remains to account for the cost of the reorganization steps. Note that the
$N$ operations are confined to levels $T_i$ for $i \le \log N + 2$. By the description, and recalling that
the size of level $T_i$ is $2^i - 1$, the total number of steps performed by the oblivious Turing
machine is bounded by

$$\sum_{i=1}^{\log N + 2} \frac{N}{2^i} \cdot O(|T_{i+2}|) \in O(N \cdot \log N).$$
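
The level structure and its reorganization schedule can be exercised with a small simulation. The sketch below is not the paper's machine. It makes several assumptions that are ours: levels are Python dictionaries rather than cyclic buffers on tapes, level $T_i$ holds the $2^i - 1$ cells strictly inside the stated interval, reorganizations are triggered for every $i \ge 0$ with $2^i$ dividing the step counter, and when several levels reorganize at the same step all write-backs are done bottom-up before all reloads are done top-down. The function names and the cost accounting (one unit per cell of each pass) are hypothetical.

```python
import random

def reference_run(ops):
    """Apply (symbol, move) operations directly to a plain tape."""
    tape, p = {}, 0
    for symbol, move in ops:
        tape[p] = symbol
        p += move
    return tape

def run_levels(ops, levels):
    """Apply the same operations through nested windows T_1..T_levels."""
    assert 2 ** (levels - 2) >= len(ops), "top level must cover the whole run"
    radius = [0] + [2 ** (i - 1) - 1 for i in range(1, levels + 1)]
    # tapes[i] maps absolute position -> symbol; every window starts centered at 0.
    tapes = [None] + [{q: 0 for q in range(-r, r + 1)} for r in radius[1:]]
    p, cost, schedule = 0, 0, []
    for t, (symbol, move) in enumerate(ops, start=1):
        tapes[1][p] = symbol                      # the real access touches only T_1
        p += move
        m = (t & -t).bit_length() - 1             # largest i with 2**i dividing t
        top = min(m, levels - 2)
        for i in range(top + 1):                  # write-backs: T_{i+1} -> T_{i+2}
            tapes[i + 2].update(tapes[i + 1])
            assert len(tapes[i + 2]) == 2 * radius[i + 2] + 1   # window never grows
            cost += 2 * radius[i + 2] + 1         # one pass over T_{i+2}
        for i in range(top, -1, -1):              # reloads: T_{i+1} <- T_{i+2}
            r = radius[i + 1]
            tapes[i + 1] = {q: tapes[i + 2][q] for q in range(p - r, p + r + 1)}
            cost += 2 * radius[i + 2] + 1
        schedule.append(top)                      # depends on t only, never on the data
    merged = {}
    for i in range(levels, 0, -1):                # lower levels hold fresher copies
        merged.update(tapes[i])
    return merged, cost, schedule
```

The `schedule` list records which levels are reorganized at each step; it is a function of the step counter alone, which mirrors the obliviousness argument. The final merge, which lets lower levels override higher ones, corresponds to collecting the freshest copy of each cell.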


We now explain how to remove the simplifying assumptions we had, the first being that 
the input machine has only one tape and the second being that the resulting oblivious 
machine uses log N many tapes. Let us first handle the former, letting k be the number of 
tapes used by the input machine. We use an encoding trick. We encode the k tapes into a 
single tape by first placing all the first cells from each tape, then the second cell, and so on. 
Each “track” will have its own head marker. By the construction of the oblivious Turing 
machine, all the tracks can be processed simultaneously (recall that our head movement 
sequence is deterministic), incurring a k multiplicative factor. 
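
As a small illustration of the encoding trick, the sketch below interleaves $k$ tapes into one so that cell $j$ of track $t$ lands at position $j \cdot k + t$. The helper names and the blank symbol are hypothetical, and the per-track head markers mentioned above are omitted for brevity.

```python
def interleave(tapes, blank="_"):
    """Merge k tapes into one: all first cells, then all second cells, and so on."""
    length = max(len(tape) for tape in tapes)
    merged = []
    for j in range(length):
        for tape in tapes:
            merged.append(tape[j] if j < len(tape) else blank)
    return merged

def cell(merged, k, track, j):
    """Read cell j of the given track from the interleaved tape."""
    return merged[j * k + track]
```

With this layout, moving one cell on any track corresponds to moving $k$ cells on the merged tape, which is consistent with the multiplicative factor of $k$.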

Now, we explain how to modify our Turing machine to use only two tapes. As a first step, 
let us place the different levels one after the other on the single tape. Naively, this incurs a 
blowup in running time due to the reorganization steps. Indeed, in the reorganization steps, 
we need to scan two levels “in parallel” as cyclic buffers. The only way to do this with a 
single tape is by moving back and forth in the tape which is too expensive. This is where 
we will use the second tape. When such a “parallel” scan is needed, we will copy one of the 
levels to the second tape, do the “parallel” scan by scanning both tapes in parallel, and then 
copy it back. This only incurs a constant overhead. 


Initialization and destruction. In our application, we will need an oblivious Turing
machine with two additional features, so we explain how to implement them next. The first
is that we need to support initialization with a given memory which might not necessarily 
be empty. We implement this by starting with an empty memory, as described above, and 
modifying the memory one element at a time. If the number of steps that we eventually 
perform on the Turing machine is about the size of the initial memory, the cost of this step 
will be amortized away. 

The second feature is a destruction procedure which outputs the memory at the end of
the computation in a linear fashion. This is not immediate since our construction does
not store the memory in a linear fashion. Recall that our construction guarantees that at the
end of every $2^i$ steps, the cells corresponding to the level $T_i$ store the content of the tape at
positions $[p - 2^{i-1} : p + 2^{i-1}]$. This means that if “destruct” is invoked after a power-of-2
many steps, the memory is stored exactly in the cells corresponding to some level and one
can make one linear scan to extract those elements and put them one next to the other.
If destruct is invoked after some other number of steps, we need to modify this procedure
slightly by collecting the most updated memory values of each cell from the appropriate level
(now, the most updated values are spread amongst different cells). This again can be done
by a single scan. ◀


5 A Differentially Oblivious Turing Machine


In this section, we describe our transformation from any Turing machine into a differentially 
oblivious one. 


▶ Theorem 11. For any $\epsilon, \delta > 0$, any $k$-tape Turing machine that makes at most $N$ steps
and consumes $S$ space can be transformed into an $(\epsilon, \delta)$-differentially oblivious $\max\{2, k\}$-tape
Turing machine that makes $O(N \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta)))$ steps and consumes
$O(S + (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta))$ space.


The construction of the differentially oblivious Turing machine uses the oblivious Turing 
machine construction from Section 4 and the head’s location estimation algorithm from 
Theorem 3. We will present the construction in steps. We first assume that the input 
machine uses only one tape and the resulting machine will use many tapes. Then, we will 
explain how to get rid of both simplifying assumptions and therefore obtain Theorem 11. 


5.1 From One Tape to Four Tapes 


Assume first that the given machine, $M$, uses only a single tape. We first present a
construction that compiles $M$ into a differentially oblivious Turing machine $\mathsf{dpM}$ with 4
tapes. Fix $\epsilon, \delta > 0$ for the rest of this section.


Tape allocation. The resulting Turing machine, $\mathsf{dpM}$, will consist of four tapes, numbered
1, 2, 3, and 4, in the following order:

1. One tape to simulate the input Turing machine computation (recall that we assumed 
that the input machine has only one tape). 

2. Two tapes for running an oblivious Turing machine (according to Section 4). 

3. One tape to run the differentially private head’s location estimation algorithm (according
to Section 3).


The algorithm. As mentioned, we use the oblivious Turing machine implementation from 
Section 4 but since its overhead is logarithmic (in the running time of the non-oblivious 
machine), we do not want to apply it directly on our machine. Instead, we are going to break 
down the computation of the original machine into epochs and invoke the oblivious machine 
only within epochs. Concretely, we split the computation of M into epochs of size 


$$p = (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta).$$


Each such epoch will be executed in its own “fresh” oblivious Turing machine and so the
overhead will only be a doubly logarithmic factor in $N$. Next, we explain how $\mathsf{dpM}$ works.


Algorithm $\mathsf{dpM}_{\epsilon,\delta}$.


1. Set $h^{0}_{\approx} = 0$ to be the initial approximate position of the head (it is equal to the real
position).

2. Break the $N$-step computation into epochs of $p$ steps of computation. For epoch
$i = 1, \ldots, N/p$, do:

a. Copy an area of size $4p + 1$ around $h^{i-1}_{\approx}$, namely $[h^{i-1}_{\approx} - 2p, h^{i-1}_{\approx} + 2p]$, to the
oblivious Turing machine (Theorem 10). Perform the next $p$ steps of computation
there. At the end of the epoch, copy the state of these $4p + 1$ cells back to the
main tape.

b. In parallel, keep track of the movements of the head and count the offset of the
head compared to the previous location, $h^{i-1}_{\approx}$. At the end of the epoch, invoke
the differentially private head’s location estimation algorithm (Theorem 8) with
privacy parameters $\epsilon$ and $\delta$ to update the location of the head $h^{i}_{\approx}$.
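
The epoch structure of the algorithm can be illustrated with the following Python sketch. It is a simplification under stated assumptions: the window is a plain dictionary standing in for the oblivious Turing machine of Theorem 10, the head estimate is the true position plus a caller-supplied integer error of magnitude at most $p$ standing in for Theorem 8, and all names (`direct_run`, `run_in_epochs`, `delta`, `error`) are hypothetical.

```python
def direct_run(delta, state, steps):
    """Run a single-tape machine with transition function delta for `steps` steps."""
    tape, head = {}, 0
    for _ in range(steps):
        state, symbol, move = delta(state, tape.get(head, 0))
        tape[head] = symbol
        head += move
    return tape, head, state

def run_in_epochs(delta, state, steps, p, error):
    """Same computation, but each epoch of p steps runs inside a copied window.

    error(epoch) must return an integer of magnitude at most p; it stands in
    for the additive error of the head-location estimate.
    """
    main, head, approx = {}, 0, 0
    for epoch, start in enumerate(range(0, steps, p)):
        lo, hi = approx - 2 * p, approx + 2 * p
        window = {q: main.get(q, 0) for q in range(lo, hi + 1)}   # 4p + 1 cells
        for _ in range(min(p, steps - start)):
            # A KeyError here would mean the head left the copied window.
            state, symbol, move = delta(state, window[head])
            window[head] = symbol
            head += move
        main.update(window)                     # copy the window back to the main tape
        approx = head + error(epoch)            # refresh the approximate head position
    return main, head, state
```

As long as the estimate is within $p$ of the true head position at the start of an epoch, the $p$ steps of that epoch stay inside the copied window, so the two runs agree.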


▶ Theorem 12. For any $\epsilon, \delta > 0$ and given any single-tape Turing machine $M$ that makes
at most $N$ steps and consumes $S$ space, the 4-tape machine $\mathsf{dpM}_{\epsilon,\delta}$ is $(\epsilon, \delta)$-differentially
oblivious, makes at most $O(N \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta)))$ steps and consumes
$O(S + (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta))$ space.


Proof. We first prove correctness, ignoring obliviousness. Consider any sequence of operations.
At any point in time, the oblivious Turing machine contains $4p + 1$ memory cells and performs
all necessary operations within them. For correctness, by the description, it is enough to show that
the $p$ operations are indeed contained within those $4p + 1$ cells. For this to hold, it is
enough to argue that $h^{i}_{\approx}$ is close enough to the real location of the head: $h^{i}_{\approx} - 2p \le h^{i} - p$
and $h^{i}_{\approx} + 2p \ge h^{i} + p$, where $h^{i}$ is the true location of the head. In other words, we need to
show that

$$|h^{i}_{\approx} - h^{i}| \le p.$$

By the utility property of the head’s location estimation algorithm (Theorem 8), we know
that $|h^{i}_{\approx} - h^{i}| \le (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta)$ (this is the upper bound on the additive error
of each head’s location estimation for every $i$). Now, the above inequality follows by
recalling that $p = (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta)$.

To prove $(\epsilon, \delta)$-differential obliviousness, consider any two sequences of operations $I_0$
and $I_1$ that differ at one operation. Consider the random variable $f_b$ corresponding to the
physical tape heads’ locations on input $I_b$ for $b \in \{0,1\}$. Say the two sequences $I_0$ and $I_1$ differ
in the $i$th operation and are otherwise identical. Then, the first $i - 1$ operations result in
identical distributions of head locations in both executions (as all the underlying building
blocks are perfectly oblivious). The only difference is in the $i$th operation. There, the
heads’ locations might differ due to a different distribution of the head’s location estimation
algorithm (Theorem 8). However, we are guaranteed that this algorithm is $(\epsilon, \delta)$-differentially
private. The rest of the heads’ movements are perfectly oblivious: the oblivious Turing
machine is perfectly oblivious, the head’s location estimation algorithm itself is perfectly
oblivious, and the other operations that we do in the implementation of $\mathsf{dpM}$ are trivially
oblivious.

Lastly, we analyze efficiency by counting the total amount of work and space required to
handle any sequence of $N$ operations that consume $S$ space. Step 2a costs $O(p \cdot \log p)$ operations
and space per epoch due to Theorem 10 (the rest of the operations can be implemented in $O(p)$ time
and space). Computing the differentially private head’s location estimation in Step 2b takes
overall $O(N + (N/p) \cdot (\log N + \log(1/\delta))) \le O(N)$ time due to Theorem 8 (i.e., constant
amortized work per access). Otherwise, simulating the original computation and accounting
for the location of the head requires $O(N)$ work and space. Overall, over $N$ operations,
the total space is $O(N + p \cdot \log p)$ and the work is bounded by $O(N \cdot \log p)$. Plugging in
$p = (1/\epsilon) \cdot \log^2 N \cdot \log(1/\delta)$ completes the proof. ◀
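
For reference, the final substitution can be spelled out as follows. The calculation only uses that $p$ is of the form $(1/\epsilon) \cdot \log^{c} N \cdot \log(1/\delta)$ for some constant $c$ (the symbol $c$ is introduced here for illustration), and that there are $N/p$ epochs, each costing $O(p \cdot \log p)$ as in the analysis above.

$$\log p = \log(1/\epsilon) + c \cdot \log\log N + \log\log(1/\delta),$$

$$\frac{N}{p} \cdot O(p \cdot \log p) = O(N \cdot \log p) = O\big(N \cdot (\log(1/\epsilon) + \log\log N + \log\log(1/\delta))\big).$$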


5.2 From One Tape to Two Tapes 


In this section we show how to obtain the same result as in the previous section (i.e., 
Theorem 12), except that our resulting Turing machine will only use two tapes (instead of 
four). One tape will be used for the simulation of the original Turing machine plus one of 
the tapes of the oblivious Turing machine and the other tape will be used to perform the 
head’s location estimation algorithm and the other tape of the oblivious Turing machine. 

Recall that the two tapes used for the oblivious Turing machine both consume about $p$ space,
but one interacts with the main tape (call it tape $\mathsf{oTM}_1$) and the other acts as a scratch
pad whose values are never accessed outside of the oblivious Turing
machine implementation (call it tape $\mathsf{oTM}_2$). We will merge tape $\mathsf{oTM}_2$ into the tape that
simulates the original computation, and tape $\mathsf{oTM}_1$ into the tape that computes the prefix
sums.


Tape 1 (Main Tape). This tape will consist of the main computation tape as well as a
blank area which is used for tape $\mathsf{oTM}_2$ of the oblivious Turing machine. We use the fact
that the required space for the oblivious Turing machine is $O(p)$ cells and so we will maintain
such a “space of blanks” which will not be too far from the real position of the head and
will be used whenever a new oblivious Turing machine is instantiated. Let us denote by
$S_{\mathsf{otm}} = O(p)$ the space consumption of the oblivious Turing machine. Our first tape, the one
that simulates the computation of the original Turing machine, will maintain the invariant
that at distance $S_{\mathsf{otm}}$ to the right of the approximate head location $h_{\approx}$, there are
$S_{\mathsf{otm}}$ blank cells (the last blank cell has distance $2 S_{\mathsf{otm}}$ from $h_{\approx}$). If this invariant holds,
then whenever an epoch begins, we can move to the blank area and use it as the oblivious
Turing machine tape. At the end of the epoch, we can go back to where we were. Since we
perform $p$ operations inside the oblivious Turing machine, the amortized cost of moving back
and forth is $O(1)$ per operation, which is what we need.

We thus need to explain how to maintain the above invariant. The idea is to move the
blank area together with the location of the head once every epoch. Namely, once we update
the approximate position of the head $h_{\approx}$, we will also move the blank area appropriately so
that its distance from the new $h_{\approx}$ is as we require. Moving the blank area, as above, can
be done simply in time $O(p)$ using a designated area of size $O(p)$ in the other tape (tape 2):
this is done by moving the area that needs to go to the blank area to tape 2 (and replacing
it with blanks), and then copying by moving both heads “in parallel”.


Tape 2 (Secondary Tape). This tape will consist of three areas, each of size $O(p)$. One of
these areas will be used for moving the blank area in tape 1, as we explained above. Another
area is for the computation of the head’s location estimation; this algorithm has state of
size $O(\log N) \le O(p)$ (which contains a frontier of a tree of noisy sums per interval). The
third part is for tape $\mathsf{oTM}_1$ of the oblivious Turing machine (which also uses $O(p)$ space).

The first and second parts in this tape are accessed at the end of every epoch and some
computation of length $O(p)$ is performed on each of them (either updating the prefix sum or
moving data around). Since each epoch handles $O(p)$ operations, the amortized cost of this
part is $O(1)$ per operation of the original machine. The third part, in contrast, is accessed
throughout the epoch, and there we get $O(\log p)$ overhead per operation.


5.3 From k Tapes to k Tapes (for k ≥ 2)


This extension is done by making two changes. First we use k tapes to simulate the 
computation of the original k-tape machine (instead of just a single tape). Second, we use the 
algorithm for estimating the head’s position which works for k tape machines (Theorem 8) — 
this incurs an overhead of k operations per step. Recall that this algorithm just runs the 
algorithm for estimating the head’s position of a single tape k times (independently). It 
remains to explain where we execute this algorithm and also where we execute the oblivious 
Turing machine since now we do not have an extra work tape. 

The idea is to first modify each tape to have two “blank areas”, each as above. Say that 
one blank area will be on the left of the head’s position and the other one on the right. The 
one on the left will act as the “Main Tape” in the above construction and the one on the 
right as the “Secondary Tape” for another tape. Concretely, tape $(i + 1) \bmod k$ acts as the
“Secondary Tape” of tape $i$ (for all $i \in [k]$).

That is, the blank area on the left of the head of tape $i$ is used to simulate the computation
of tape $i$ in the original machine and also to execute tape $\mathsf{oTM}_2$ of the oblivious Turing
machine when simulating tape $i$ (this is exactly the same usage of the blank area as above).
The blank area on the right of the head of tape $i$ consists of three areas: (1) an area used to
maintain the blank areas in tape $(i + 1) \bmod k$, (2) an area used for the computation of the
head’s location estimation of tape $(i + 1) \bmod k$, and (3) tape $\mathsf{oTM}_1$ of the oblivious Turing
machine when simulating tape $(i + 1) \bmod k$. Overall, these changes incur an extra $O(k)$
factor in the overhead of the simulation.


6 Lower Bound


In this section we prove that our differentially oblivious Turing machine is optimal in terms 
of overhead in a natural range of parameters. Specifically, we prove the following theorem. 


▶ Theorem 13. There exists an algorithmic task for which there is a Turing machine that, on
input of size $N$, completes it in $O(N)$ steps. On the other hand, for any $0 < s \le \sqrt{N}$, $\epsilon > 0$,
$0 < \beta < 1$, and $0 \le \delta \le \beta \cdot (\epsilon/s) \cdot e^{-2\epsilon s}$, any $(\epsilon, \delta)$-differentially oblivious implementation
(even on a RAM and in the balls and bins model) for this task must consume $\Omega(N \cdot \log s)$
steps with probability $1 - \beta$.


Proof. The work of Chan et al. [7] proves the following theorem concerning the overhead
required to stably sort a set of balls according to associated 1-bit keys while maintaining differential
obliviousness. Here, we assume that the balls are opaque and so no non-trivial encoding on
them can be done [6].


▶ Theorem 14 (Theorem 4.7 in [7]). Let $0 < s \le \sqrt{N}$. Suppose $\epsilon > 0$, $0 < \beta < 1$, and
$0 \le \delta \le \beta \cdot (\epsilon/s) \cdot e^{-2\epsilon s}$. Then, any (even randomized) stable sorting algorithm for balls
according to associated 1-bit keys in the RAM model that is $(\epsilon, \delta)$-differentially oblivious must
have some input on which it incurs at least $\Omega(N \cdot \log s)$ memory accesses with probability at
least $1 - \beta$.


The task of stably sorting $N$ balls according to associated 1-bit keys can be implemented
using a Turing machine in $O(N)$ steps. Consider an input of the form $(k_1, v_1), \ldots, (k_N, v_N)$,
where $k_i \in \{0,1\}$ is a 1-bit key and $v_i$ is the $i$th ball. The idea is to scan the input from
the beginning and, whenever we see an element $(k_i, v_i)$, do one of the following. If $k_i = 0$,
we write $(k_i, v_i)$ to the next position in the output tape. If $k_i = 1$, we write it to the next
position in the work tape. After we finish scanning the input, we scan the work tape and
write all of its elements, from first to last, to the output tape. It is immediate that this algorithm
is correct and has $O(N)$ running time. ◀
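
The linear-time procedure can be sketched in Python, with two lists standing in for the output tape and the work tape. The function name is hypothetical and the sketch makes no attempt at obliviousness; it only illustrates the non-oblivious $O(N)$ upper bound.

```python
def stable_one_bit_sort(items):
    """Stably sort (key, ball) pairs with keys in {0, 1} using one scan plus one copy."""
    output, work = [], []
    for key, ball in items:
        if key == 0:
            output.append((key, ball))   # 0-keys go straight to the output tape
        else:
            work.append((key, ball))     # 1-keys are parked on the work tape
    output.extend(work)                  # append the work tape, first to last
    return output
```

Each element is written at most twice, and the relative order within each key class is preserved because both lists are filled and read front to back.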


A Sampling Noise on a Turing Machine


One of the operations our differentially oblivious Turing machine needs to do is to sample from
the (continuous) Laplacian distribution $\mathrm{Lap}(\cdot)$; this is used in the algorithm for estimating the
head’s location; see Section 3. There, we need to generate a sample from $\mathrm{Lap}((1 + \log_2 N)/\epsilon)$
and we need to do this about $N$ times. Recall that a Laplacian distribution is unbounded and
samples need infinite precision. We show that with a small, tolerable loss in precision (which
does not affect our final result), one can sample an approximation from this distribution on
a standard Turing machine.

We assume that $\ln(1/\epsilon)$ is an integer so that we do not have rounding issues. Also, recall
that $\delta$ is a negligible function of the form $\exp(-\log^2 N)$. First, we switch to a bounded
version of the distribution, chopping off the tail, which contains elements that occur with
negligible probability. We can assume that we sample from the range $\pm(\log(N)/\epsilon) \cdot \mathrm{poly}\log(1/\delta)$.
Let us call $\delta_0$ the probability mass that we chopped off. Sampling from the bounded version
turns our $(\epsilon, \delta')$-differentially private prefix sum algorithm into an $(\epsilon, \delta)$-differentially private
one, where $\delta = \delta' + N \cdot (e^{\epsilon} \cdot \delta_0 + \delta_0)$. To see this, observe that we have essentially $N$ instances
of the Lap noise, and so by a simple union bound, each event
happens with probability at most $N \cdot \delta_0$ larger under the bounded distribution than under the
unbounded version. Namely, for any set $S$, $\Pr[X_{\mathrm{bounded}} \in S] \le \Pr[X \in S] + N \cdot \delta_0$, where
$X_{\mathrm{bounded}}$ is the output of the mechanism when using bounded noise and $X$ is the output of the original
mechanism. Then, by differential privacy, $\Pr[X \in S] + N \cdot \delta_0 \le e^{\epsilon} \Pr[Y \in S] + \delta' + N \cdot \delta_0$,
where $Y$ is the output of the unbounded-noise mechanism on a neighboring input. Then, again
by bounding the noise used in $Y$, we get


$$\Pr[X_{\mathrm{bounded}} \in S] \le e^{\epsilon} (\Pr[Y_{\mathrm{bounded}} \in S] + N \delta_0) + \delta' + N \delta_0.$$


Since we think of $\delta_0$ as being negligible in $N$ and $\epsilon$ as being a constant, $\delta$ is also negligible.⁶


⁶ The above analysis is very loose. In particular, one can do a tighter analysis and not lose the
linear-in-$N$ factor in $\delta$, but for our purposes it does not matter since $\delta$ is negligible in $N$ anyway.

The next step is to represent each element in the bounded range with finite precision.
We want to lose at most a $\delta$ factor in precision (which will add another additive $\delta$ factor to
our additive error), and so if we use $\ell$ bits of precision, we have the inequality:


$$2^{-\ell} \cdot ((\log N)/\epsilon) \cdot \mathrm{poly}\log(1/\delta) \le \delta.$$


This means that it is enough to use $\ell \in O(\log((\log N \cdot \log\log(1/\delta))/(\epsilon\delta)))$ bits of precision,
which can be bounded by $O(\log^2(1/\delta))$ bits since $\delta$ is negligible in $N$ and $\epsilon$ is a constant.
Therefore, all operations can be executed efficiently enough (in time $O(\mathrm{poly}\log N)$), which,
by slightly changing parameters (e.g., the value of $p$), does not affect our asymptotic upper
bound on the running time of our differentially private Turing machine.
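
The two approximation steps (bounding the range and then fixing the precision) can be illustrated with the Python sketch below. It is a loose illustration and not the procedure a Turing machine would run: it starts from a floating-point Laplace sample, it bounds the range by clamping (one possible reading of "chopping off the tail"; resampling would be another), and the function name and parameters are hypothetical.

```python
import random

def bounded_laplace_sample(scale, bound, bits, rng):
    """Sample Laplace(0, scale), clamp it to [-bound, bound], and round it
    to a grid whose step is bound * 2**(-bits)."""
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    x = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    x = max(-bound, min(bound, x))          # step 1: bounded range
    step = bound * 2.0 ** (-bits)
    return round(x / step) * step           # step 2: finite precision
```

In the notation above, `bound` plays the role of the bounded range and `bits` the role of $\ell$: the rounding error is at most half a grid step, so it shrinks geometrically as `bits` grows.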